Statistical inference with anchored Bayesian mixture of regressions models: A case study analysis of allometric data
We present a case study in which we use a mixture of regressions model to
improve on an ill-fitting simple linear regression model relating log brain
mass to log body mass for 100 placental mammalian species. The slope of this
regression model is of particular scientific interest because it corresponds to
a constant that governs a hypothesized allometric power law relating brain mass
to body mass. A specific line of investigation is to determine whether the
regression parameters vary across subgroups of related species.
We model these data using an anchored Bayesian mixture of regressions model,
which modifies the standard Bayesian Gaussian mixture by pre-assigning small
subsets of observations to given mixture components with probability one. These
observations (called anchor points) break the relabeling invariance typical of
exchangeable model specifications (the so-called label-switching problem). A
careful choice of which observations to pre-classify to which mixture
components is key to the specification of a well-fitting anchor model.
In the article we compare three strategies for the selection of anchor
points. The first assumes that the underlying mixture of regressions model
holds and assigns anchor points to different components to maximize the
information about their labeling. The second makes no assumption about the
relationship between x and y and instead identifies anchor points using a
bivariate Gaussian mixture model. The third strategy begins with the assumption
that there is only one mixture regression component and identifies anchor
points that are representative of a clustering structure based on case-deletion
importance sampling weights. We compare the performance of the three strategies
on the allometric data set and use auxiliary taxonomic information about the
species to evaluate the model-based classifications estimated from these
models.
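The anchoring idea can be sketched with simulated data. Below, a two-component mixture of regressions through the origin is fit by EM, with a few observations per component pre-assigned (their responsibilities clamped to one). All data values and slopes are hypothetical, and this maximum-likelihood sketch stands in for the paper's Bayesian treatment; it only illustrates how anchors pin down the component labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data in the spirit of the allometric example: two regression
# components with different slopes (hypothetical values, not the paper's data).
n = 100
x = rng.uniform(0.5, 5.0, n)
z = rng.integers(0, 2, n)                      # true (latent) component labels
slopes = np.array([0.5, 1.2])
y = slopes[z] * x + rng.normal(0.0, 0.1, n)

# Anchor points: a few observations pre-assigned to each component with
# probability one, which breaks the label-switching symmetry.
anchors = {0: np.where(z == 0)[0][:3], 1: np.where(z == 1)[0][:3]}

# EM for a two-component mixture of regressions through the origin, with the
# anchors' responsibilities clamped at every E-step.
beta = np.array([0.1, 2.0])                    # initial slope guesses
sigma, pi = 0.5, np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior responsibilities under the current parameters.
    dens = np.stack([pi[k] * np.exp(-0.5 * ((y - beta[k] * x) / sigma) ** 2)
                     for k in range(2)])
    r = dens / dens.sum(axis=0)
    for k, idx in anchors.items():             # clamp anchor labels
        r[:, idx] = 0.0
        r[k, idx] = 1.0
    # M-step: weighted least squares per component.
    for k in range(2):
        beta[k] = (r[k] * x * y).sum() / (r[k] * x * x).sum()
    pi = r.sum(axis=1) / n
    sigma = np.sqrt((r * (y - np.outer(beta, x)) ** 2).sum() / n)
```

Because the anchors fix which component is "component 0", the fitted slopes come back in a determinate order rather than up to an arbitrary relabeling.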
Bayesian Synthesis: Combining subjective analyses, with an application to ozone data
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging. In a standard implementation of our method,
several data analysts work independently on portions of a data set, eliciting
separate models which are eventually updated and combined through a specific
weighting method. We call this modeling procedure Bayesian Synthesis. The
methodology helps to alleviate concerns about the sizable gap between the
foundational underpinnings of the Bayesian paradigm and the practice of
Bayesian statistics. In experimental work we show that human modeling has
predictive performance superior to that of many automatic modeling techniques,
including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and
LARS, and only slightly inferior to that of BART. We also show that Bayesian
Synthesis further improves predictive performance. Additionally, we examine the
predictive performance of a simple average across analysts, which we dub Convex
Synthesis, and find that it also produces an improvement.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/10-AOAS444
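The combination step can be sketched numerically. Here three simulated "analysts" (the paper elicits these from human modelers) produce noisy predictive means for held-out data; Convex Synthesis is the plain average, while an inverse-MSE weighting, an illustrative choice rather than the paper's specific weighting method, stands in for the performance-based combination:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical validation data and predictive means from three independent
# "analysts", simulated with different accuracy levels.
y_val = rng.normal(0.0, 1.0, 200)
preds = np.stack([y_val + rng.normal(0.0, s, 200) for s in (0.3, 0.6, 1.0)])

# Convex Synthesis: a plain average of the analysts' predictions.
convex = preds.mean(axis=0)

# One possible performance-based weighting (illustrative only):
# weights proportional to inverse validation MSE, normalized to sum to one.
mse = ((preds - y_val) ** 2).mean(axis=1)
w = (1.0 / mse) / (1.0 / mse).sum()
synthesis = (w[:, None] * preds).sum(axis=0)

def rmse(p):
    return float(np.sqrt(((p - y_val) ** 2).mean()))
```

Averaging analysts whose errors differ pays off because the errors partially cancel, which is the mechanism the abstract attributes to model averaging generally.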
Rediscovering a little known fact about the t-test and the F-test: Algebraic, Geometric, Distributional and Graphical Considerations
We discuss the role that the null hypothesis should play in the construction
of a test statistic used to make a decision about that hypothesis. To construct
the test statistic for a point null hypothesis about a binomial proportion, a
common recommendation is to act as if the null hypothesis is true. We argue
that, on the surface, the one-sample t-test of a point null hypothesis about a
Gaussian population mean does not appear to follow the recommendation. We show
how simple algebraic manipulations of the usual t-statistic lead to an
equivalent test procedure consistent with the recommendation. We provide
geometric intuition regarding this equivalence and we consider extensions to
testing nested hypotheses in Gaussian linear models. We discuss an application
to graphical residual diagnostics where the form of the test statistic makes a
practical difference. By examining the formulation of the test statistic from
multiple perspectives in this familiar example, we provide simple, concrete
illustrations of some important issues that can guide the formulation of
effective solutions to more complex statistical problems.
Comment: 22 pages, 5 figures
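The algebraic equivalence can be checked directly. Writing t for the usual one-sample statistic (which scales by deviations from the sample mean) and T0 for the variant that acts as if the null is true (scaling by deviations from mu0), the decomposition of sums of squares gives T0^2 = n t^2 / (n - 1 + t^2), a monotone function of t^2, so the two statistics define the same test. The sample values below are purely illustrative:

```python
import math

# Hypothetical sample and point null value (illustrative numbers).
x = [4.8, 5.1, 5.4, 4.9, 5.6, 5.0, 5.3]
mu0 = 5.0
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)   # usual sample variance
t = math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)

# "Act as if the null is true": scale by deviations from mu0, not xbar.
s0_2 = sum((xi - mu0) ** 2 for xi in x) / n
T0 = math.sqrt(n) * (xbar - mu0) / math.sqrt(s0_2)

# Since sum (xi - mu0)^2 = (n-1) s^2 + n (xbar - mu0)^2, it follows that
# T0^2 = n t^2 / (n - 1 + t^2): the statistics are monotonically related.
```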
Case-deletion importance sampling estimators: Central limit theorems and related results
Case-deleted analysis is a popular method for evaluating the influence of a
subset of cases on inference. The use of Monte Carlo estimation strategies in
complicated Bayesian settings leads naturally to the use of importance sampling
techniques to assess the divergence between full-data and case-deleted
posteriors and to provide estimates under the case-deleted posteriors. However,
the dependability of the importance sampling estimators depends critically on
the variability of the case-deleted weights. We provide theoretical results
concerning the assessment of the dependability of case-deleted importance
sampling estimators in several Bayesian models. In particular, these results
allow us to establish whether or not the estimators satisfy a central limit
theorem. Because the conditions we derive are of a simple analytical nature,
the assessment of the dependability of the estimators can be verified routinely
before estimation is performed. We illustrate the use of the results in several
examples.
Comment: Published in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org); DOI: http://dx.doi.org/10.1214/08-EJS259
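The basic estimator can be sketched in a toy conjugate model (a deliberately simple setting, not one of the paper's examples). Draws from the full-data posterior are reweighted by weights proportional to 1 / p(y_i | theta) to estimate a case-deleted posterior mean, and the effective sample size of the weights serves as a quick, informal check on their variability:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: y_i ~ N(theta, 1) with a flat prior on theta, so the
# full-data posterior for theta is N(ybar, 1/n).
y = rng.normal(0.0, 1.0, 30)
n = len(y)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), 5000)  # full-data posterior draws

# Case-deletion importance weights for dropping case i: proportional to
# 1 / p(y_i | theta), evaluated at the full-data posterior draws.
i = 0
w = np.exp(0.5 * (y[i] - draws) ** 2)   # 1 / N(y_i; theta, 1), up to a constant
w /= w.sum()                            # self-normalize

# Importance-sampling estimate of the case-deleted posterior mean, plus the
# effective sample size as an informal dependability diagnostic.
theta_del = float((w * draws).sum())
ess = float(1.0 / (w ** 2).sum())
```

In this conjugate setting the case-deleted posterior mean is available in closed form (the mean of the remaining observations), which makes the toy example easy to validate; the paper's results concern when such weight-based estimators satisfy a central limit theorem in less tractable models.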
Asymptotics of Lower Dimensional Zero-Density Regions
Topological data analysis (TDA) allows us to explore the topological features
of a dataset. Among topological features, lower dimensional ones have recently
drawn the attention of practitioners in mathematics and statistics due to their
potential to aid the discovery of low dimensional structure in a data set.
However, lower dimensional features are usually challenging to detect from a
probabilistic perspective.
In this paper, lower dimensional topological features occurring as
zero-density regions of density functions are introduced and thoroughly
investigated. Specifically, we consider sequences of coverings for the support
of a density function in which the coverings are comprised of balls with
shrinking radii. We show that, when these coverings satisfy certain sufficient
conditions as the sample size goes to infinity, we can detect lower
dimensional, zero-density regions with increasingly higher probability while
guarding against false detection. We supplement the theoretical developments
with the discussion of simulated experiments that elucidate the behavior of the
methodology for different choices of the tuning parameters that govern the
construction of the covering sequences and characterize the asymptotic results.
Comment: 28 pages
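The detection idea can be sketched as follows. In this hypothetical example (not one of the paper's experiments), a uniform density on the unit square is given a zero-density band around the line x = 0.5, the support is covered by balls centered on a grid, and balls containing no sample points flag candidate zero-density regions; in the theory the radius shrinks with the sample size, so the fixed tuning below is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical density on [0, 1]^2 with a zero-density band around x = 0.5,
# i.e. a lower dimensional feature embedded in two dimensions.
n = 5000
pts = rng.uniform(0.0, 1.0, (n, 2))
pts = pts[np.abs(pts[:, 0] - 0.5) > 0.05]     # carve out the zero-density band

# Cover the support with balls centered on a grid. The radius and grid
# spacing are illustrative stand-ins for the paper's shrinking-radius sequences.
r = 0.03
g = np.linspace(0.05, 0.95, 19)
centers = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)

# A ball containing no sample points flags a candidate zero-density region.
empty = np.array([c for c in centers
                  if (np.linalg.norm(pts - c, axis=1) < r).sum() == 0])
```

With enough data, empty balls concentrate along the true zero-density band while balls over the positive-density region almost never come up empty, mirroring the detection-versus-false-detection trade-off the asymptotics formalize.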
Vestibular Stimulation for ADHD: Randomized Controlled Trial of Comprehensive Motion Apparatus
Objective:
This research evaluates the effects of vestibular stimulation by the Comprehensive Motion Apparatus
(CMA) in ADHD.
Method:
Children ages 6 to 12 (48 boys, 5 girls) with ADHD were randomized to thrice-weekly 30-min
treatments for 12 weeks with CMA, stimulating otoliths and semicircular canals, or a single-blind control of
equal duration and intensity, each treatment followed by a 20-min typing tutorial.
Results:
In intent-to-treat analysis (n = 50), the primary outcome improved significantly in both groups (p =
.0001, d = 1.09 to 1.30), but the treatment difference was not significant (p = .7). Control children regressed by
follow-up (difference p = .034, d = 0.65), but the overall difference was not significant (p = .13, d = .47). No
measure showed significant treatment differences at treatment end, but one did at follow-up. Children with
IQ-achievement discrepancy ≥ 1 SD showed significantly more CMA advantage on three measures.
Conclusion:
This study illustrates the importance of a credible control condition of equal duration and intensity
in trials of novel treatments. CMA treatment cannot be recommended for combined-type ADHD without
learning disorder.
SARS-CoV-2 susceptibility and COVID-19 disease severity are associated with genetic variants affecting gene expression in a variety of tissues
Variability in SARS-CoV-2 susceptibility and COVID-19 disease severity between individuals is partly due to
genetic factors. Here, we identify 4 genomic loci with suggestive associations for SARS-CoV-2 susceptibility
and 19 for COVID-19 disease severity. Four of these 23 loci likely have an ethnicity-specific component.
Genome-wide association study (GWAS) signals in 11 loci colocalize with expression quantitative trait loci
(eQTLs) associated with the expression of 20 genes in 62 tissues/cell types (range: 1–43 tissues per gene),
including lung, brain, heart, muscle, and skin as well as the digestive system and immune system. We perform
genetic fine mapping to compute 99% credible SNP sets, which identify 10 GWAS loci that have eight or fewer
SNPs in the credible set, including three loci with one single likely causal SNP. Our study suggests that the
diverse symptoms and disease severity of COVID-19 observed between individuals are associated with variants across the genome that affect gene expression levels in a wide variety of tissue types.
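The credible-set construction can be sketched directly: given posterior probabilities that each SNP at a locus is the causal variant, the 99% credible set is the smallest collection of SNPs whose probabilities sum to at least 0.99. The probabilities below are illustrative placeholders, not values from the study:

```python
# Hypothetical posterior probabilities that each SNP at one locus is causal
# (illustrative numbers; fine mapping derives these from GWAS results).
pips = {"rs1": 0.90, "rs2": 0.08, "rs3": 0.015, "rs4": 0.004, "rs5": 0.001}

def credible_set(probs, level=0.99):
    """Smallest set of SNPs whose posterior probabilities sum to >= level."""
    total, chosen = 0.0, []
    for snp, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        chosen.append(snp)
        total += p
        if total >= level:
            break
    return chosen

cs = credible_set(pips)
```

A locus where a single SNP carries nearly all the posterior mass yields a one-SNP credible set, which is the situation the abstract describes for three of the ten well-resolved loci.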